Deploy copy-on-write for more efficient messaging #38
Conversation
Benchmarking Broker revealed high copying overhead, because Broker internally copies topics and data frequently in the streams between actors. Using a copy-on-write tuple over topic and data instead of a pair avoids these copies as long as messages are accessed read-only.
A quick follow-up on the serialization/deserialization topic. We're already working on improving the performance of CAF applications by pushing deserialization out of the critical path in the I/O loop: actor-framework/actor-framework#821. Simply switching to the CAF topic branch is not enough, though, since Broker gives the scheduler too few threads to play with. However, I ran a quick experiment by recompiling against the topic branch with the following configuration:

```ini
[scheduler]
max-threads=4

[middleman]
num-workers=2
```

Here's what I get:
Throughput around 100k messages per second with two relays. Almost double what I get with current CAF+Broker.
Looks promising. The API breakage is unfortunate, but hard to avoid; and overall it makes the differentiation between the different message types clearer. Can we move the accessors for topic and data into methods? That would require a new class deriving from the underlying tuple type.
I don't like the API breakage in principle, and it seems technically possible to avoid, but these are also the main entry points of the library. I can work on finalizing a merge this week or next.
Thanks for the PR. I updated the Python bindings and merged.
Related to the changes from zeek/broker#38
Sorry for the silence on my end. I was ill last week and I'm still catching up. I'm glad that you share the view that a clean break now is better than accumulating inefficient APIs.
It's already merged, but I still would like to share some thoughts. The C++ community as a whole is moving towards more free functions. Plus consistency, as pointed out by @jsiwek. 🙂
Thanks a lot!
Thanks for the shared thoughts; I had some additional ones, just for completeness:
This is a breaking API change.
Sorry for the long delay. I've had this in the pipeline for quite a while now, but I haven't had the cycles since we were working non-stop on releasing our first version for Tenzir.
Summary
One of the outcomes of the performance evaluation we did for BroCon 2018 was that Broker wastes a lot of cycles by making unnecessary copies. This is because CAF assumes that individual elements in a stream are cheap to copy, but `broker::data` is not. To fix this, we wrap the topic/data pairs that travel through Broker's pub/sub channels into copy-on-write optimized tuples. This turns the previous deep copy for each individual pair into a simple reference count increment.
Is it worth it? I think it's a crucial step in the right direction and drastically improves performance for multiple local subscribers, but Broker's performance in a distributed setup isn't affected as much as I'd hoped.
API Changes
From a user perspective, Broker now returns a `data_message` (please see the new `message.hh` header) wherever it used to return a `std::pair<topic, data>`. I've added convenience functions to interact with the new COW tuple in a more descriptive manner than using `get<0>(...)` and `get<1>(...)`.

Please note that I did not look into the Python bindings. Maybe the C++ changes can even be hidden in the Python API without loss of efficiency.
Evaluation
I've used my personal MacMini for the evaluation (Late 2012, 4 cores), so by all means please feel free to re-run this on more sophisticated hardware.
broker-stream-benchmark
This benchmark uses `endpoint::subscribe` and `endpoint::publish_all`. It's as close to the underlying stream processing as it gets when using the `both` mode to run everything in one process. Hence, this benchmark should show the biggest improvement.

First up is `master`, hitting CTRL+C after the first 10 messages: 106k to 198k messages per second.
We can see that the values fluctuate more heavily this time around. However, we get 790k to 3.4M messages per second out of Broker this time. The worst case of the new implementation is a 4x speedup over the previous best case.
BroCon18 Throughput Setup
I've also revisited the BroCon18 setup for the throughput measurement. The scripts can be found in our events repository. I tweaked it slightly to stop at 1k payload and only used 1-2 nodes.
This time around, we are measuring Broker in a distributed setup:

Publisher -> Node_1 -> ... -> Node_n -> Subscriber

with varying payloads. Because we are dealing with serialization and deserialization this time around, we won't see a similar jump in performance. At BroCon, we've seen that serialization overhead starts dominating at 1k payloads, so I've only looked at payloads of 100.
First up is `master` again. At the beginning of each run there are a couple of 0 entries in the CSV output, so we take the last 10 measurements: 42k to 55k messages per second with two nodes between Ping and Pong. How is the topic branch performing?
No 4x speedup this time, but 61k to 76k is still an increase of more than 10% over the previous best case. Keep in mind that this is purely based on pruning unnecessary copies.